freebayes¶

freebayes_cram · 2 contributors · 1 version

usage: freebayes [OPTION] … [BAM FILE] … Bayesian haplotype-based polymorphism discovery. Version:1.3.1

Quickstart¶

from janis_bioinformatics.tools.freebayes.versions import FreeBayesCram_1_3

wf = WorkflowBuilder("myworkflow")

wf.step(
    "freebayes_cram_step",
    FreeBayesCram_1_3(
        bams=None,
        reference=None,
        ploidy=None,
        refQual=None,
        maxNumOfAlleles=None,
        haplotypeLength=None,
        minRepSize=None,
        minRepEntropy=None,
        useDupFlag=None,
        minMappingQual=None,
        minBaseQual=None,
        minSupQsum=None,
        minSupMQsum=None,
        minSupBQthres=None,
        minAltCount=None,
        minAltQSum=None,
        minAltTotal=None,
        minCov=None,
        genotypingMaxIter=None,
        genotypingMaxBDepth=None,
        postIntegrationLim=None,
    )
)
wf.output("out", source=freebayes_cram_step.out)

OR

Install Janis
Ensure Janis is configured to work with Docker or Singularity.
Ensure all reference files are available:

Note

More information about these inputs are available below.

Generate user input files for freebayes_cram:

# user inputs
janis inputs freebayes_cram > inputs.yaml

inputs.yaml

bams:
- bams_0.cram
- bams_1.cram
reference: reference.fasta

Run freebayes_cram with:

janis run [...run options] \
    --inputs inputs.yaml \
    freebayes_cram

Information¶

ID:	`freebayes_cram`
URL:	https://github.com/ekg/freebayes
Versions:	1.3.1
Container:	shollizeck/freebayes:1.3.1
Authors:	Sebastian Hollizeck, Michael Franklin
Citations:	Garrison E, Marth G. Haplotype-based variant detection from short-read sequencing. arXiv preprint arXiv:1207.3907 [q-bio.GN] 2012
Created:	2019-10-19
Updated:	2019-10-19

Outputs¶

name	type	documentation
out	VCF

Additional configuration (inputs)¶

name	type	prefix	documentation
bams	Array<CramPair>	-b	Add FILE to the set of BAM files to be analyzed.
reference	FastaFai	-f	Use FILE as the reference sequence for analysis. An index file (FILE.fai) will be created if none exists. If neither –targets nor –region are specified, FreeBayes will analyze every position in this reference.
ploidy	Integer	-p	Sets the default ploidy for the analysis to N. default: 2
refQual	String	–reference-quality	–reference-quality MQ,BQ Assign mapping quality of MQ to the reference allele at each site and base quality of BQ. default: 100,60
maxNumOfAlleles	Integer	-n	Evaluate only the best N SNP alleles, ranked by sum of supporting quality scores. (Set to 0 to use all; default: all)
haplotypeLength	Integer	–haplotype-length	Allow haplotype calls with contiguous embedded matches of up to this length. Set N=-1 to disable clumping. (default: 3)
minRepSize	Integer	–min-repeat-size	When assembling observations across repeats, require the total repeat length at least this many bp. (default: 5)
minRepEntropy	Integer	–min-repeat-entropy	To detect interrupted repeats, build across sequence until it has entropy > N bits per bp. Set to 0 to turn off. (default: 1)
useDupFlag	Boolean	-4	Include duplicate-marked alignments in the analysis. default: exclude duplicates marked as such in alignments
minMappingQual	Integer	-m	Exclude alignments from analysis if they have a mapping quality less than Q. default: 1
minBaseQual	Integer	-q	-q –min-base-quality Q Exclude alleles from analysis if their supporting base quality is less than Q. default: 0
minSupQsum	Integer	-R	-R –min-supporting-allele-qsum Q Consider any allele in which the sum of qualities of supporting observations is at least Q. default: 0
minSupMQsum	Integer	-Y	-Y –min-supporting-mapping-qsum Q Consider any allele in which and the sum of mapping qualities of supporting reads is at least Q. default: 0
minSupBQthres	Integer	-Q	-Q –mismatch-base-quality-threshold Q Count mismatches toward –read-mismatch-limit if the base quality of the mismatch is >= Q. default: 10
minAltCount	Integer	-C	-C –min-alternate-count N Require at least this count of observations supporting an alternate allele within a single individual in order to evaluate the position. default: 2
minAltQSum	Integer	-3	-3 –min-alternate-qsum N Require at least this sum of quality of observations supporting an alternate allele within a single individual in order to evaluate the position. default: 0
minAltTotal	Integer	-G	-G –min-alternate-total N Require at least this count of observations supporting an alternate allele within the total population in order to use the allele in analysis. default: 1
minCov	Integer	–min-coverage	–min-coverage N Require at least this coverage to process a site. default: 0
genotypingMaxIter	Integer	-B	-B –genotyping-max-iterations N Iterate no more than N times during genotyping step. default: 1000.
genotypingMaxBDepth	Integer	–genotyping-max-banddepth	–genotyping-max-banddepth N Integrate no deeper than the Nth best genotype by likelihood when genotyping. default: 6.
postIntegrationLim	String	-W	-W –posterior-integration-limits N,M Integrate all genotype combinations in our posterior space which include no more than N samples with their Mth best data likelihood. default: 1,3.
bamList	Optional<TextFile>	-L	A file containing a list of BAM files to be analyzed.
targetsFile	Optional<bed>	-t	Limit analysis to targets listed in the BED-format FILE.
region	Optional<String>	-r	<chrom>:<start_position>-<end_position> Limit analysis to the specified region, 0-base coordinates, end_position not included (same as BED format). Either ‘-’ or ‘..’ maybe used as a separator.
samplesFile	Optional<TextFile>	-s	FILE Limit analysis to samples listed (one per line) in the FILE. By default FreeBayes will analyze all samples in its input BAM files.
popFile	Optional<TextFile>	–populations	FILE Each line of FILE should list a sample and a population which it is part of. The population-based bayesian inference model will then be partitioned on the basis of the populations.
cnvFile	Optional<TextFile>	-A	FILE Read a copy number map from the BED file FILE, which has either a sample-level ploidy: sample name, copy number or a region-specific format: reference sequence, start, end, sample name, copy number … for each region in each sample which does not have the default copy number as set by –ploidy.
outputFilename	Optional<Filename>	-v	FILE Output VCF-format results to FILE. (default: stdout)
gvcfFlag	Optional<Boolean>	–gvcf	Write gVCF output, which indicates coverage in uncalled regions.
gvcfChunkSize	Optional<Integer>	–gvcf-chunk	When writing gVCF output emit a record for every NUM bases.
candidateVcf	Optional<File>	-@	Use variants reported in VCF file as input to the algorithm. Variants in this file will included in the output even if there is not enough support in the data to pass input filters.
restrictSitesFlag	Optional<Boolean>	-l	Only provide variant calls and genotype likelihoods for sites and alleles which are provided in the VCF input, and provide output in the VCF for all input alleles, not just those which have support in the data.
candidateHaploVcf	Optional<File>	–haplotype-basis-alleles	When specified, only variant alleles provided in this input VCF will be used for the construction of complex or haplotype alleles.
reportHapAllelesFlag	Optional<Boolean>	–report-all-haplotype-alleles	At sites where genotypes are made over haplotype alleles, provide information about all alleles in output, not only those which are called.
monomorphicFlag	Optional<Boolean>	–report-monomorphic	Report even loci which appear to be monomorphic, and report all considered alleles, even those which are not in called genotypes. Loci which do not have any potential alternates have ‘.’ for ALT.
polyMoprhProbFlag	Optional<Float>	-P	Report sites if the probability that there is a polymorphism at the site is greater than N. default: 0.0. Note that post-filtering is generally recommended over the use of this parameter.
strictFlag	Optional<Boolean>	–strict-vcf	Generate strict VCF format (FORMAT/GQ will be an int)
theta	Optional<Float>	-T	The expected mutation rate or pairwise nucleotide diversity among the population under analysis. This serves as the single parameter to the Ewens Sampling Formula prior model default: 0.001
pooledDiscreteFlag	Optional<Boolean>	-J	Assume that samples result from pooled sequencing. Model pooled samples using discrete genotypes across pools. When using this flag, set –ploidy to the number of alleles in each sample or use the –cnv-map to define per-sample ploidy.
pooledContinousFlag	Optional<Boolean>	-K	Output all alleles which pass input filters, regardles of genotyping outcome or model.
addRefFlag	Optional<Boolean>	-Z	This flag includes the reference allele in the analysis as if it is another sample from the same population.
ignoreSNPsFlag	Optional<Boolean>	-I	Ignore SNP alleles.
ignoreINDELsFlag	Optional<Boolean>	-i	Ignore insertion and deletion alleles.
ignoreMNPsFlag	Optional<Boolean>	-X	Ignore multi-nuceotide polymorphisms, MNPs.
ignoreComplexVarsFlag	Optional<Boolean>	-u	Ignore complex events (composites of other classes).
maxNumOfComplexVars	Optional<Integer>	-E
noPartObsFlag	Optional<Boolean>	–no-partial-observations	Exclude observations which do not fully span the dynamically-determined detection window. (default, use all observations, dividing partial support across matching haplotypes when generating haplotypes.)
noNormaliseFlag	Optional<Boolean>	-O	Turn off left-alignment of indels, which is enabled by default.
readMisMatchLim	Optional<Integer>	-U	-U –read-mismatch-limit N Exclude reads with more than N mismatches where each mismatch has base quality >= mismatch-base-quality-threshold. default: ~unbounded
maxMisMatchFrac	Optional<Float>	-z	-z –read-max-mismatch-fraction N Exclude reads with more than N [0,1] fraction of mismatches where each mismatch has base quality >= mismatch-base-quality-threshold default: 1.0
readSNPLim	Optional<Integer>	-$	-$ –read-snp-limit N Exclude reads with more than N base mismatches, ignoring gaps with quality >= mismatch-base-quality-threshold. default: ~unbounded
readINDELLim	Optional<Integer>	-e	-e –read-indel-limit N Exclude reads with more than N separate gaps. default: ~unbounded
standardFilterFlag	Optional<Boolean>	-0	-0 –standard-filters Use stringent input base and mapping quality filters Equivalent to -m 30 -q 20 -R 0 -S 0
minAltFrac	Optional<Float>	-F	-F –min-alternate-fraction N Require at least this fraction of observations supporting an alternate allele within a single individual in the in order to evaluate the position. default: 0.05
maxCov	Optional<Integer>	–limit-coverage	Downsample per-sample coverage to this level if greater than this coverage. default: no limit
noPopPriorsFlag	Optional<Boolean>	-k	-k –no-population-priors Equivalent to –pooled-discrete –hwe-priors-off and removal of Ewens Sampling Formula component of priors.
noHWEPriorsFlag	Optional<Boolean>	-w	-w –hwe-priors-off Disable estimation of the probability of the combination arising under HWE given the allele frequency as estimated by observation frequency.
noBinOBSPriorsFlag	Optional<Boolean>	-V	-V –binomial-obs-priors-off Disable incorporation of prior expectations about observations. Uses read placement probability, strand balance probability, and read position (5’-3’) probability.
noABPriorsFlag	Optional<Boolean>	-a	-a –allele-balance-priors-off Disable use of aggregate probability of observation balance between alleles as a component of the priors.
obsBiasFile	Optional<TextFile>	–observation-bias	–observation-bias FILE Read length-dependent allele observation biases from FILE. The format is [length] [alignment efficiency relative to reference] where the efficiency is 1 if there is no relative observation bias.
baseQualCap	Optional<Integer>	–base-quality-cap	–base-quality-cap Q Limit estimated observation quality by capping base quality at Q.
probContamin	Optional<Float>	–prob-contamination	–prob-contamination F An estimate of contamination to use for all samples. default: 10e-9
legGLScalc	Optional<Boolean>	–legacy-gls	–legacy-gls Use legacy (polybayes equivalent) genotype likelihood calculations
contaminEst	Optional<TextFile>	–contamination-estimates	–contamination-estimates FILE A file containing per-sample estimates of contamination, such as those generated by VerifyBamID. The format should be: sample p(read=R\|genotype=AR) p(read=A\|genotype=AA) Sample ‘*’ can be used to set default contamination estimates.
reportMaxGLFlag	Optional<Boolean>	–report-genotype-likelihood-max	–report-genotype-likelihood-max Report genotypes using the maximum-likelihood estimate provided from genotype likelihoods.
excludeUnObsGT	Optional<Boolean>	-N	-N –exclude-unobserved-genotypes Skip sample genotypings for which the sample has no supporting reads.
gtVarThres	Optional<Integer>	-S	-S –genotype-variant-threshold N Limit posterior integration to samples where the second-best genotype likelihood is no more than log(N) from the highest genotype likelihood for the sample. default: ~unbounded
useMQFlag	Optional<Boolean>	-j	-j –use-mapping-quality Use mapping quality of alleles when calculating data likelihoods.
harmIndelQualFlag	Optional<Boolean>	-H	-H –harmonic-indel-quality Use a weighted sum of base qualities around an indel, scaled by the distance from the indel. By default use a minimum BQ in flanking sequence.
readDepFact	Optional<Float>	-D	-D –read-dependence-factor N Incorporate non-independence of reads by scaling successive observations by this factor during data likelihood calculations. default: 0.9
gtQuals	Optional<Boolean>	-=	-= –genotype-qualities Calculate the marginal probability of genotypes and report as GQ in each sample field in the VCF output.
skipCov	Optional<Integer>	–skip-coverage	Skip processing of alignments overlapping positions with coverage >N. This filters sites above this coverage, but will also reduce data nearby. default: no limit

Workflow Description Language¶

version development

task freebayes_cram {
  input {
    Int? runtime_cpu
    Int? runtime_memory
    Int? runtime_seconds
    Int? runtime_disks
    Array[File] bams
    Array[File] bams_crai
    File? bamList
    File reference
    File reference_fai
    File? targetsFile
    String? region
    File? samplesFile
    File? popFile
    File? cnvFile
    String? outputFilename
    Boolean? gvcfFlag
    Int? gvcfChunkSize
    File? candidateVcf
    Boolean? restrictSitesFlag
    File? candidateHaploVcf
    Boolean? reportHapAllelesFlag
    Boolean? monomorphicFlag
    Float? polyMoprhProbFlag
    Boolean? strictFlag
    Float? theta
    Int? ploidy
    Boolean? pooledDiscreteFlag
    Boolean? pooledContinousFlag
    Boolean? addRefFlag
    String? refQual
    Boolean? ignoreSNPsFlag
    Boolean? ignoreINDELsFlag
    Boolean? ignoreMNPsFlag
    Boolean? ignoreComplexVarsFlag
    Int? maxNumOfAlleles
    Int? maxNumOfComplexVars
    Int? haplotypeLength
    Int? minRepSize
    Int? minRepEntropy
    Boolean? noPartObsFlag
    Boolean? noNormaliseFlag
    Boolean? useDupFlag
    Int? minMappingQual
    Int? minBaseQual
    Int? minSupQsum
    Int? minSupMQsum
    Int? minSupBQthres
    Int? readMisMatchLim
    Float? maxMisMatchFrac
    Int? readSNPLim
    Int? readINDELLim
    Boolean? standardFilterFlag
    Float? minAltFrac
    Int? minAltCount
    Int? minAltQSum
    Int? minAltTotal
    Int? minCov
    Int? maxCov
    Boolean? noPopPriorsFlag
    Boolean? noHWEPriorsFlag
    Boolean? noBinOBSPriorsFlag
    Boolean? noABPriorsFlag
    File? obsBiasFile
    Int? baseQualCap
    Float? probContamin
    Boolean? legGLScalc
    File? contaminEst
    Boolean? reportMaxGLFlag
    Int? genotypingMaxIter
    Int? genotypingMaxBDepth
    String? postIntegrationLim
    Boolean? excludeUnObsGT
    Int? gtVarThres
    Boolean? useMQFlag
    Boolean? harmIndelQualFlag
    Float? readDepFact
    Boolean? gtQuals
    Int? skipCov
  }
  command <<<
    set -e
    freebayes \
      ~{if length(bams) > 0 then "-b '" + sep("' -b '", bams) + "'" else ""} \
      ~{if defined(bamList) then ("-L '" + bamList + "'") else ""} \
      -f '~{reference}' \
      ~{if defined(targetsFile) then ("-t '" + targetsFile + "'") else ""} \
      ~{if defined(region) then ("-r '" + region + "'") else ""} \
      ~{if defined(samplesFile) then ("-s '" + samplesFile + "'") else ""} \
      ~{if defined(popFile) then ("--populations '" + popFile + "'") else ""} \
      ~{if defined(cnvFile) then ("-A '" + cnvFile + "'") else ""} \
      -v '~{select_first([outputFilename, "generated.vcf"])}' \
      ~{if select_first([gvcfFlag, false]) then "--gvcf" else ""} \
      ~{if defined(gvcfChunkSize) then ("--gvcf-chunk " + gvcfChunkSize) else ''} \
      ~{if defined(candidateVcf) then ("-@ '" + candidateVcf + "'") else ""} \
      ~{if (defined(restrictSitesFlag) && select_first([restrictSitesFlag])) then "-l" else ""} \
      ~{if defined(candidateHaploVcf) then ("--haplotype-basis-alleles '" + candidateHaploVcf + "'") else ""} \
      ~{if (defined(reportHapAllelesFlag) && select_first([reportHapAllelesFlag])) then "--report-all-haplotype-alleles" else ""} \
      ~{if (defined(monomorphicFlag) && select_first([monomorphicFlag])) then "--report-monomorphic" else ""} \
      ~{if defined(polyMoprhProbFlag) then ("-P " + polyMoprhProbFlag) else ''} \
      ~{if (defined(strictFlag) && select_first([strictFlag])) then "--strict-vcf" else ""} \
      ~{if defined(theta) then ("-T " + theta) else ''} \
      -p ~{select_first([ploidy, 2])} \
      ~{if (defined(pooledDiscreteFlag) && select_first([pooledDiscreteFlag])) then "-J" else ""} \
      ~{if (defined(pooledContinousFlag) && select_first([pooledContinousFlag])) then "-K" else ""} \
      ~{if (defined(addRefFlag) && select_first([addRefFlag])) then "-Z" else ""} \
      --reference-quality '~{select_first([refQual, "100,60"])}' \
      ~{if (defined(ignoreSNPsFlag) && select_first([ignoreSNPsFlag])) then "-I" else ""} \
      ~{if (defined(ignoreINDELsFlag) && select_first([ignoreINDELsFlag])) then "-i" else ""} \
      ~{if (defined(ignoreMNPsFlag) && select_first([ignoreMNPsFlag])) then "-X" else ""} \
      ~{if (defined(ignoreComplexVarsFlag) && select_first([ignoreComplexVarsFlag])) then "-u" else ""} \
      -n ~{select_first([maxNumOfAlleles, 0])} \
      ~{if defined(maxNumOfComplexVars) then ("-E " + maxNumOfComplexVars) else ''} \
      --haplotype-length ~{select_first([haplotypeLength, 3])} \
      --min-repeat-size ~{select_first([minRepSize, 5])} \
      --min-repeat-entropy ~{select_first([minRepEntropy, 1])} \
      ~{if (defined(noPartObsFlag) && select_first([noPartObsFlag])) then "--no-partial-observations" else ""} \
      ~{if (defined(noNormaliseFlag) && select_first([noNormaliseFlag])) then "-O" else ""} \
      ~{if select_first([useDupFlag, false]) then "-4" else ""} \
      -m ~{select_first([minMappingQual, 1])} \
      -q ~{select_first([minBaseQual, 0])} \
      -R ~{select_first([minSupQsum, 0])} \
      -Y ~{select_first([minSupMQsum, 0])} \
      -Q ~{select_first([minSupBQthres, 10])} \
      ~{if defined(readMisMatchLim) then ("-U " + readMisMatchLim) else ''} \
      ~{if defined(maxMisMatchFrac) then ("-z " + maxMisMatchFrac) else ''} \
      ~{if defined(readSNPLim) then ("-$ " + readSNPLim) else ''} \
      ~{if defined(readINDELLim) then ("-e " + readINDELLim) else ''} \
      ~{if (defined(standardFilterFlag) && select_first([standardFilterFlag])) then "-0" else ""} \
      ~{if defined(minAltFrac) then ("-F " + minAltFrac) else ''} \
      -C ~{select_first([minAltCount, 2])} \
      -3 ~{select_first([minAltQSum, 0])} \
      -G ~{select_first([minAltTotal, 1])} \
      --min-coverage ~{select_first([minCov, 0])} \
      ~{if defined(maxCov) then ("--limit-coverage " + maxCov) else ''} \
      ~{if (defined(noPopPriorsFlag) && select_first([noPopPriorsFlag])) then "-k" else ""} \
      ~{if (defined(noHWEPriorsFlag) && select_first([noHWEPriorsFlag])) then "-w" else ""} \
      ~{if (defined(noBinOBSPriorsFlag) && select_first([noBinOBSPriorsFlag])) then "-V" else ""} \
      ~{if (defined(noABPriorsFlag) && select_first([noABPriorsFlag])) then "-a" else ""} \
      ~{if defined(obsBiasFile) then ("--observation-bias '" + obsBiasFile + "'") else ""} \
      ~{if defined(baseQualCap) then ("--base-quality-cap " + baseQualCap) else ''} \
      ~{if defined(probContamin) then ("--prob-contamination " + probContamin) else ''} \
      ~{if (defined(legGLScalc) && select_first([legGLScalc])) then "--legacy-gls" else ""} \
      ~{if defined(contaminEst) then ("--contamination-estimates '" + contaminEst + "'") else ""} \
      ~{if (defined(reportMaxGLFlag) && select_first([reportMaxGLFlag])) then "--report-genotype-likelihood-max" else ""} \
      -B ~{select_first([genotypingMaxIter, 1000])} \
      --genotyping-max-banddepth ~{select_first([genotypingMaxBDepth, 6])} \
      -W '~{select_first([postIntegrationLim, "1,3"])}' \
      ~{if (defined(excludeUnObsGT) && select_first([excludeUnObsGT])) then "-N" else ""} \
      ~{if defined(gtVarThres) then ("-S " + gtVarThres) else ''} \
      ~{if (defined(useMQFlag) && select_first([useMQFlag])) then "-j" else ""} \
      ~{if (defined(harmIndelQualFlag) && select_first([harmIndelQualFlag])) then "-H" else ""} \
      ~{if defined(readDepFact) then ("-D " + readDepFact) else ''} \
      ~{if (defined(gtQuals) && select_first([gtQuals])) then "-=" else ""} \
      ~{if defined(skipCov) then ("--skip-coverage " + skipCov) else ''}
  >>>
  runtime {
    cpu: select_first([runtime_cpu, 1, 1])
    disks: "local-disk ~{select_first([runtime_disks, 20])} SSD"
    docker: "shollizeck/freebayes:1.3.1"
    duration: select_first([runtime_seconds, 86400])
    memory: "~{select_first([runtime_memory, 2, 4])}G"
    preemptible: 2
  }
  output {
    File out = select_first([outputFilename, "generated.vcf"])
  }
}

Common Workflow Language¶

#!/usr/bin/env cwl-runner
class: CommandLineTool
cwlVersion: v1.2
label: freebayes
doc: |
  usage: freebayes [OPTION] ... [BAM FILE] ...
  Bayesian haplotype-based polymorphism discovery.
  Version:1.3.1

requirements:
- class: ShellCommandRequirement
- class: InlineJavascriptRequirement
- class: DockerRequirement
  dockerPull: shollizeck/freebayes:1.3.1

inputs:
- id: bams
  label: bams
  doc: Add FILE to the set of BAM files to be analyzed.
  type:
    type: array
    inputBinding:
      prefix: -b
    items: File
  inputBinding: {}
- id: bamList
  label: bamList
  doc: A file containing a list of BAM files to be analyzed.
  type:
  - File
  - 'null'
  inputBinding:
    prefix: -L
- id: reference
  label: reference
  doc: |2-
     Use FILE as the reference sequence for analysis. An index file (FILE.fai) will be created if none exists. If neither --targets nor --region are specified, FreeBayes will analyze every position in this reference.
  type: File
  secondaryFiles:
  - pattern: .fai
  inputBinding:
    prefix: -f
- id: targetsFile
  label: targetsFile
  doc: ' Limit analysis to targets listed in the BED-format FILE.'
  type:
  - File
  - 'null'
  inputBinding:
    prefix: -t
- id: region
  label: region
  doc: |-
    <chrom>:<start_position>-<end_position> Limit analysis to the specified region, 0-base coordinates, end_position not included (same as BED format). Either '-' or '..' maybe used as a separator.
  type:
  - string
  - 'null'
  inputBinding:
    prefix: -r
- id: samplesFile
  label: samplesFile
  doc: |-
    FILE  Limit analysis to samples listed (one per line) in the FILE. By default FreeBayes will analyze all samples in its input BAM files.
  type:
  - File
  - 'null'
  inputBinding:
    prefix: -s
- id: popFile
  label: popFile
  doc: |-
    FILE Each line of FILE should list a sample and a population which it is part of. The population-based bayesian inference model will then be partitioned on the basis of the populations.
  type:
  - File
  - 'null'
  inputBinding:
    prefix: --populations
- id: cnvFile
  label: cnvFile
  doc: |-
    FILE Read a copy number map from the BED file FILE, which has either a sample-level ploidy: sample name, copy number or a region-specific format: reference sequence, start, end, sample name, copy number ... for each region in each sample which does not have the default copy number as set by --ploidy.
  type:
  - File
  - 'null'
  inputBinding:
    prefix: -A
- id: outputFilename
  label: outputFilename
  doc: 'FILE Output VCF-format results to FILE. (default: stdout)'
  type:
  - string
  - 'null'
  default: generated.vcf
  inputBinding:
    prefix: -v
- id: gvcfFlag
  label: gvcfFlag
  doc: Write gVCF output, which indicates coverage in uncalled regions.
  type: boolean
  default: false
  inputBinding:
    prefix: --gvcf
- id: gvcfChunkSize
  label: gvcfChunkSize
  doc: ' When writing gVCF output emit a record for every NUM bases.'
  type:
  - int
  - 'null'
  inputBinding:
    prefix: --gvcf-chunk
- id: candidateVcf
  label: candidateVcf
  doc: |2-
     Use variants reported in VCF file as input to the algorithm. Variants in this file will included in the output even if there is not enough support in the data to pass input filters.
  type:
  - File
  - 'null'
  inputBinding:
    prefix: -@
- id: restrictSitesFlag
  label: restrictSitesFlag
  doc: |-
    Only provide variant calls and genotype likelihoods for sites and alleles which are provided in the VCF input, and provide output in the VCF for all input alleles, not just those which have support in the data.
  type:
  - boolean
  - 'null'
  inputBinding:
    prefix: -l
- id: candidateHaploVcf
  label: candidateHaploVcf
  doc: |-
    When specified, only variant alleles provided in this input VCF will be used for the construction of complex or haplotype alleles.
  type:
  - File
  - 'null'
  inputBinding:
    prefix: --haplotype-basis-alleles
- id: reportHapAllelesFlag
  label: reportHapAllelesFlag
  doc: |-
    At sites where genotypes are made over haplotype alleles, provide information about all alleles in output, not only those which are called.
  type:
  - boolean
  - 'null'
  inputBinding:
    prefix: --report-all-haplotype-alleles
- id: monomorphicFlag
  label: monomorphicFlag
  doc: |2-
     Report even loci which appear to be monomorphic, and report all considered alleles, even those which are not in called genotypes. Loci which do not have any potential alternates have '.' for ALT.
  type:
  - boolean
  - 'null'
  inputBinding:
    prefix: --report-monomorphic
- id: polyMoprhProbFlag
  label: polyMoprhProbFlag
  doc: |-
    Report sites if the probability that there is a polymorphism at the site is greater than N. default: 0.0. Note that post-filtering is generally recommended over the use of this parameter.
  type:
  - float
  - 'null'
  inputBinding:
    prefix: -P
- id: strictFlag
  label: strictFlag
  doc: Generate strict VCF format (FORMAT/GQ will be an int)
  type:
  - boolean
  - 'null'
  inputBinding:
    prefix: --strict-vcf
- id: theta
  label: theta
  doc: |-
    The expected mutation rate or pairwise nucleotide diversity among the population under analysis. This serves as the single parameter to the Ewens Sampling Formula prior model default: 0.001
  type:
  - float
  - 'null'
  inputBinding:
    prefix: -T
- id: ploidy
  label: ploidy
  doc: 'Sets the default ploidy for the analysis to N. default: 2'
  type: int
  default: 2
  inputBinding:
    prefix: -p
- id: pooledDiscreteFlag
  label: pooledDiscreteFlag
  doc: |-
    Assume that samples result from pooled sequencing. Model pooled samples using discrete genotypes across pools. When using this flag, set --ploidy to the number of alleles in each sample or use the --cnv-map to define per-sample ploidy.
  type:
  - boolean
  - 'null'
  inputBinding:
    prefix: -J
- id: pooledContinousFlag
  label: pooledContinousFlag
  doc: |-
    Output all alleles which pass input filters, regardles of genotyping outcome or model.
  type:
  - boolean
  - 'null'
  inputBinding:
    prefix: -K
- id: addRefFlag
  label: addRefFlag
  doc: |-
    This flag includes the reference allele in the analysis as if it is another sample from the same population.
  type:
  - boolean
  - 'null'
  inputBinding:
    prefix: -Z
- id: refQual
  label: refQual
  doc: |-
    --reference-quality MQ,BQ  Assign mapping quality of MQ to the reference allele at each site and base quality of BQ. default: 100,60
  type: string
  default: 100,60
  inputBinding:
    prefix: --reference-quality
- id: ignoreSNPsFlag
  label: ignoreSNPsFlag
  doc: Ignore SNP alleles.
  type:
  - boolean
  - 'null'
  inputBinding:
    prefix: -I
- id: ignoreINDELsFlag
  label: ignoreINDELsFlag
  doc: Ignore insertion and deletion alleles.
  type:
  - boolean
  - 'null'
  inputBinding:
    prefix: -i
- id: ignoreMNPsFlag
  label: ignoreMNPsFlag
  doc: Ignore multi-nuceotide polymorphisms, MNPs.
  type:
  - boolean
  - 'null'
  inputBinding:
    prefix: -X
- id: ignoreComplexVarsFlag
  label: ignoreComplexVarsFlag
  doc: Ignore complex events (composites of other classes).
  type:
  - boolean
  - 'null'
  inputBinding:
    prefix: -u
- id: maxNumOfAlleles
  label: maxNumOfAlleles
  doc: |-
    Evaluate only the best N SNP alleles, ranked by sum of supporting quality scores. (Set to 0 to use all; default: all)
  type: int
  default: 0
  inputBinding:
    prefix: -n
- id: maxNumOfComplexVars
  label: maxNumOfComplexVars
  doc: ''
  type:
  - int
  - 'null'
  inputBinding:
    prefix: -E
- id: haplotypeLength
  label: haplotypeLength
  doc: |-
    Allow haplotype calls with contiguous embedded matches of up to this length. Set N=-1 to disable clumping. (default: 3)
  type: int
  default: 3
  inputBinding:
    prefix: --haplotype-length
- id: minRepSize
  label: minRepSize
  doc: |-
    When assembling observations across repeats, require the total repeat length at least this many bp. (default: 5)
  type: int
  default: 5
  inputBinding:
    prefix: --min-repeat-size
- id: minRepEntropy
  label: minRepEntropy
  doc: |-
    To detect interrupted repeats, build across sequence until it has  entropy > N bits per bp. Set to 0 to turn off. (default: 1)
  type: int
  default: 1
  inputBinding:
    prefix: --min-repeat-entropy
- id: noPartObsFlag
  label: noPartObsFlag
  doc: |-
    Exclude observations which do not fully span the dynamically-determined detection window. (default, use all observations, dividing partial support across matching haplotypes when generating haplotypes.)
  type:
  - boolean
  - 'null'
  inputBinding:
    prefix: --no-partial-observations
- id: noNormaliseFlag
  label: noNormaliseFlag
  doc: Turn off left-alignment of indels, which is enabled by default.
  type:
  - boolean
  - 'null'
  inputBinding:
    prefix: -O
- id: useDupFlag
  label: useDupFlag
  doc: |-
    Include duplicate-marked alignments in the analysis. default: exclude duplicates marked as such in alignments
  type: boolean
  default: false
  inputBinding:
    prefix: '-4'
- id: minMappingQual
  label: minMappingQual
  doc: |2-
     Exclude alignments from analysis if they have a mapping quality less than Q. default: 1
  type: int
  default: 1
  inputBinding:
    prefix: -m
- id: minBaseQual
  label: minBaseQual
  doc: |2-
     -q --min-base-quality Q Exclude alleles from analysis if their supporting base quality is less than Q. default: 0
  type: int
  default: 0
  inputBinding:
    prefix: -q
- id: minSupQsum
  label: minSupQsum
  doc: |2-
     -R --min-supporting-allele-qsum Q Consider any allele in which the sum of qualities of supporting observations is at least Q. default: 0
  type: int
  default: 0
  inputBinding:
    prefix: -R
- id: minSupMQsum
  label: minSupMQsum
  doc: |2-
     -Y --min-supporting-mapping-qsum Q Consider any allele in which and the sum of mapping qualities of supporting reads is at least Q. default: 0
  type: int
  default: 0
  inputBinding:
    prefix: -Y
- id: minSupBQthres
  label: minSupBQthres
  doc: |2-
     -Q --mismatch-base-quality-threshold Q Count mismatches toward --read-mismatch-limit if the base quality of the mismatch is >= Q. default: 10
  type: int
  default: 10
  inputBinding:
    prefix: -Q
- id: readMisMatchLim
  label: readMisMatchLim
  doc: |2-
     -U --read-mismatch-limit N Exclude reads with more than N mismatches where each mismatch has base quality >= mismatch-base-quality-threshold. default: ~unbounded
  type:
  - int
  - 'null'
  inputBinding:
    prefix: -U
- id: maxMisMatchFrac
  label: maxMisMatchFrac
  doc: |2-
     -z --read-max-mismatch-fraction N Exclude reads with more than N [0,1] fraction of mismatches where each mismatch has base quality >= mismatch-base-quality-threshold default: 1.0
  type:
  - float
  - 'null'
  inputBinding:
    prefix: -z
- id: readSNPLim
  label: readSNPLim
  doc: |2-
     -$ --read-snp-limit N Exclude reads with more than N base mismatches, ignoring gaps with quality >= mismatch-base-quality-threshold. default: ~unbounded
  type:
  - int
  - 'null'
  inputBinding:
    prefix: -$
- id: readINDELLim
  label: readINDELLim
  doc: |2-
     -e --read-indel-limit N Exclude reads with more than N separate gaps. default: ~unbounded
  type:
  - int
  - 'null'
  inputBinding:
    prefix: -e
- id: standardFilterFlag
  label: standardFilterFlag
  doc: |2-
     -0 --standard-filters Use stringent input base and mapping quality filters Equivalent to -m 30 -q 20 -R 0 -S 0
  type:
  - boolean
  - 'null'
  inputBinding:
    prefix: '-0'
- id: minAltFrac
  label: minAltFrac
  doc: |2-
     -F --min-alternate-fraction N Require at least this fraction of observations supporting an alternate allele within a single individual in the in order to evaluate the position. default: 0.05
  type:
  - float
  - 'null'
  inputBinding:
    prefix: -F
- id: minAltCount
  label: minAltCount
  doc: |2-
     -C --min-alternate-count N Require at least this count of observations supporting an alternate allele within a single individual in order to evaluate the position. default: 2
  type: int
  default: 2
  inputBinding:
    prefix: -C
- id: minAltQSum
  label: minAltQSum
  doc: |2-
     -3 --min-alternate-qsum N Require at least this sum of quality of observations supporting an alternate allele within a single individual in order to evaluate the position. default: 0
  type: int
  default: 0
  inputBinding:
    prefix: '-3'
- id: minAltTotal
  label: minAltTotal
  doc: |2-
     -G --min-alternate-total N Require at least this count of observations supporting an alternate allele within the total population in order to use the allele in analysis. default: 1
  type: int
  default: 1
  inputBinding:
    prefix: -G
- id: minCov
  label: minCov
  doc: ' --min-coverage N Require at least this coverage to process a site. default:
    0'
  type: int
  default: 0
  inputBinding:
    prefix: --min-coverage
- id: maxCov
  label: maxCov
  doc: |-
    Downsample per-sample coverage to this level if greater than this coverage. default: no limit
  type:
  - int
  - 'null'
  inputBinding:
    prefix: --limit-coverage
- id: noPopPriorsFlag
  label: noPopPriorsFlag
  doc: |2-
     -k --no-population-priors Equivalent to --pooled-discrete --hwe-priors-off and removal of Ewens Sampling Formula component of priors.
  type:
  - boolean
  - 'null'
  inputBinding:
    prefix: -k
- id: noHWEPriorsFlag
  label: noHWEPriorsFlag
  doc: |2-
     -w --hwe-priors-off Disable estimation of the probability of the combination arising under HWE given the allele frequency as estimated by observation frequency.
  type:
  - boolean
  - 'null'
  inputBinding:
    prefix: -w
- id: noBinOBSPriorsFlag
  label: noBinOBSPriorsFlag
  doc: |2-
     -V --binomial-obs-priors-off Disable incorporation of prior expectations about observations. Uses read placement probability, strand balance probability, and read position (5'-3') probability.
  type:
  - boolean
  - 'null'
  inputBinding:
    prefix: -V
- id: noABPriorsFlag
  label: noABPriorsFlag
  doc: |2-
     -a --allele-balance-priors-off Disable use of aggregate probability of observation balance between alleles as a component of the priors.
  type:
  - boolean
  - 'null'
  inputBinding:
    prefix: -a
- id: obsBiasFile
  label: obsBiasFile
  doc: |2-
     --observation-bias FILE Read length-dependent allele observation biases from FILE. The format is [length] [alignment efficiency relative to reference] where the efficiency is 1 if there is no relative observation bias.
  type:
  - File
  - 'null'
  inputBinding:
    prefix: --observation-bias
- id: baseQualCap
  label: baseQualCap
  doc: |2-
     --base-quality-cap Q Limit estimated observation quality by capping base quality at Q.
  type:
  - int
  - 'null'
  inputBinding:
    prefix: --base-quality-cap
- id: probContamin
  label: probContamin
  doc: |2-
     --prob-contamination F An estimate of contamination to use for all samples. default: 10e-9
  type:
  - float
  - 'null'
  inputBinding:
    prefix: --prob-contamination
- id: legGLScalc
  label: legGLScalc
  doc: ' --legacy-gls Use legacy (polybayes equivalent) genotype likelihood calculations'
  type:
  - boolean
  - 'null'
  inputBinding:
    prefix: --legacy-gls
- id: contaminEst
  label: contaminEst
  doc: |2-
     --contamination-estimates FILE A file containing per-sample estimates of contamination, such as those generated by VerifyBamID. The format should be: sample p(read=R|genotype=AR) p(read=A|genotype=AA) Sample '*' can be used to set default contamination estimates.
  type:
  - File
  - 'null'
  inputBinding:
    prefix: --contamination-estimates
- id: reportMaxGLFlag
  label: reportMaxGLFlag
  doc: |2-
     --report-genotype-likelihood-max Report genotypes using the maximum-likelihood estimate provided from genotype likelihoods.
  type:
  - boolean
  - 'null'
  inputBinding:
    prefix: --report-genotype-likelihood-max
- id: genotypingMaxIter
  label: genotypingMaxIter
  doc: |2-
     -B --genotyping-max-iterations N Iterate no more than N times during genotyping step. default: 1000.
  type: int
  default: 1000
  inputBinding:
    prefix: -B
- id: genotypingMaxBDepth
  label: genotypingMaxBDepth
  doc: |2-
     --genotyping-max-banddepth N Integrate no deeper than the Nth best genotype by likelihood when genotyping. default: 6.
  type: int
  default: 6
  inputBinding:
    prefix: --genotyping-max-banddepth
- id: postIntegrationLim
  label: postIntegrationLim
  doc: |2-
     -W --posterior-integration-limits N,M Integrate all genotype combinations in our posterior space which include no more than N samples with their Mth best data likelihood. default: 1,3.
  type: string
  default: 1,3
  inputBinding:
    prefix: -W
- id: excludeUnObsGT
  label: excludeUnObsGT
  doc: |2-
     -N --exclude-unobserved-genotypes Skip sample genotypings for which the sample has no supporting reads.
  type:
  - boolean
  - 'null'
  inputBinding:
    prefix: -N
- id: gtVarThres
  label: gtVarThres
  doc: |2-
     -S --genotype-variant-threshold N Limit posterior integration to samples where the second-best genotype likelihood is no more than log(N) from the highest genotype likelihood for the sample. default: ~unbounded
  type:
  - int
  - 'null'
  inputBinding:
    prefix: -S
- id: useMQFlag
  label: useMQFlag
  doc: |2-
     -j --use-mapping-quality Use mapping quality of alleles when calculating data likelihoods.
  type:
  - boolean
  - 'null'
  inputBinding:
    prefix: -j
- id: harmIndelQualFlag
  label: harmIndelQualFlag
  doc: |2-
     -H --harmonic-indel-quality Use a weighted sum of base qualities around an indel, scaled by the distance from the indel. By default use a minimum BQ in flanking sequence.
  type:
  - boolean
  - 'null'
  inputBinding:
    prefix: -H
- id: readDepFact
  label: readDepFact
  doc: |2-
     -D --read-dependence-factor N Incorporate non-independence of reads by scaling successive observations by this factor during data likelihood calculations. default: 0.9
  type:
  - float
  - 'null'
  inputBinding:
    prefix: -D
- id: gtQuals
  label: gtQuals
  doc: |2-
     -= --genotype-qualities Calculate the marginal probability of genotypes and report as GQ in each sample field in the VCF output.
  type:
  - boolean
  - 'null'
  inputBinding:
    prefix: -=
- id: skipCov
  label: skipCov
  doc: |-
    Skip processing of alignments overlapping positions with coverage >N. This filters sites above this coverage, but will also reduce data nearby. default: no limit
  type:
  - int
  - 'null'
  inputBinding:
    prefix: --skip-coverage

outputs:
- id: out
  label: out
  type: File
  outputBinding:
    glob: generated.vcf
    loadContents: false
stdout: _stdout
stderr: _stderr

baseCommand: freebayes
arguments: []

hints:
- class: ToolTimeLimit
  timelimit: |-
    $([inputs.runtime_seconds, 86400].filter(function (inner) { return inner != null })[0])
id: freebayes_cram